Maximum Likelihood Estimation of Drift and Diffusion Functions 



O 

o 

(N 



(N 
(N 



C ■ 

a ; 

ctf ■ 
+-> • 
ctf ■ 

O ■ 

• i-H . 

C/3 . 
>v 
43 ■ 



(N 
> 
(N 
O 



vo : 
o . 

o : 

>>: 
43 : 

Oh. 



X 



David Kleinhans and Rudolf Friedrich 

Westfalische Wilhelms-Universitat Miinster, D-48149 Miinster, Germany 
(Dated: February 2, 2008) 

The maximum likelihood approach is adapted to the problem of estimation of drift and diffusion 
functions of stochastic processes from measured time series. We reconcile a previously devised 
iterative procedure [1] and put the application of the method on a firm theoretical basis. 
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I. INTRODUCTION 

Complex systems of physics, chemistry, and biology 
are composed of a huge number of microscopic subsys- 
tems interacting on a fast time scale. Self organized be- 
haviour may arise on a macroscopic length and time scale 
which can be described by suitably defined order parame- 
ters. The microscopic degree's of freedom, however, show 
up in terms of fast temporal variations which effectively 
can be treated as random fluctuations [2] . The adequate 
description of such systems viewed from a macroscopic 
perspective are Langevin equations, which contain a de- 
terministic part described by the drift vector and a fluc- 
tuating part whose impact on the dynamics is quantified 
by a diffusion matrix [3, 4]. 

Recently, a procedure has been proposed that allows 
for a direct estimation of these quantities and, hence, of 
the stochastic dynamics from measured data [5]. This 
procedure has provided a deeper insight to a broad class 
of systems, especially in the field of life sciences [5-7]. 
Moreover, also turbulence research has greatly benefited 
from this procedure [8]. 

However, the procedure is based on the estimation of 
conditional moments in the limit of high sampling fre- 
quencies, 

D<*>(x)= lim -(\x(t + T ) -x(t)] k \x(t) = 

(1) 

for k — 1 and k = 2, respectively. D^\x) is the drift 
vector, while D^> (x) exhibits the diffusion matrix of the 
underlying process at position x. The limiting procedure 
(1) can be problematic in case of a finite time resolution 
of measured data. Moreover, any presence of measure- 
ment or discretization noise seriously interferes with the 
convergence of the limiting procedure [19]. 

Recently, we proposed an iterative method that cir- 
cumvents this limiting procedure [1]. It is based on the 
minimization of the Kullback-Leibler distance [9, 10] be- 
tween the two time joint probability distribution func- 
tions (pdf) obtained from the data and the simulated 
process for a certain set of parameters, respectively. The 
starting configuration of this iterative procedure as well 
as a suitable parametrization of drift and diffusion func- 
tions can be obtained by the direct estimates based on 
the smallest reliable time increment r, provided by (1). 

On the other hand, the analysis of discrete stochastic 



processes by means of maximum likelihood-methods has 
made great progress in recent years: Since it has become 
evident, that the maximisation of the likelihood function 
is a powerful tool for the analysis of Markovian time series 
[11], several methods have been proposed to optimise the 
calculation of the required conditional transition pdfs [12, 
13]. For a recent study on the preferences of current 
methods we refer to [14]. 

The intention of the present note is to derive a 
maximum likelihood estimator for parameters of the 
parametrized drift vector and diffusion matrix, that 
purely is based on the conditional and joint transition 
pdfs of the dataset under consideration. By this means, 
the Kullback-Leibler estimator reappears in case of an en- 
semble of individual measurements - but now physically 
well motivated. We want to point out, that the evalua- 
tion of a specific parametrisation by means of the min- 
imization of the Kullback-Leibler estimator yields great 
advantages, since this function is bounded from below by 
the value 0. Hence, the goodness of a single parametri- 
sation can be assessed. 

Moreover, with respect to our previous treatment [1], 
a simplified maximum likelihood estimator is introduced 
for the analysis of nonlinear time series exhibiting Marko- 
vian properties. This estimator leads to a reasonable re- 
duction of the required computational effort compared to 
a direct application of the former method and is proposed 
for future application in nonlinear time series analysis. 
Accurate results can be obtained even in the case of few 
or sparsely sampled measurement data. However, the 
relevance of the results obtained in the case of data sets 
involving few data points carefully has to be reconsidered 
in a self consistent manner. 



II. MAXIMUM LIKELIHOOD ESTIMATION ON 
ENSEMBLES: RECONCILIATION WITH THE 
KULLBACK-LEIBLER ESTIMATOR [1] 

We consider time series x(to), . . . ,x(t n ), fj < t. L+ i of 
n recordings of a multivariate stochastic variable. Fur- 
thermore, we assume that the time lag between consec- 
utive observations is r. Henceforth, the abbreviation 
Xi := x(to + it) will be used. 

In this section, the estimation of drift and diffusion 
functions from an ensemble of N independent time series 
is considered. Such data sets generally are obtained from 
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measurements on an ensemble of N independent systems, 
that are performed simultaneously. In this vein, the time 
evolution of the stochastic properties can be analysed. 
For the present case, we restrict ourselves without loss 
of generality to the analysis of the first two consecutive 
measurements x^ and x\ with fee [1, iV] . 

By means of the direct estimation described in [5] , drift 
and diffusion functions can be estimated from data from 
the Kramers-Moyal expansion coefficients (1). On the 
basis of this estimate, models for the drift and diffusion 
function, respectively, can be constructed, depending on 
a set of parameters, A. This procedure is described in 
greater detail in [1]. 

The likelihood of the current realization for one specific 
set of parameters, A, can be expressed by means of the 
joint pdf, 

P (a?j, Xq, Sj, Xq, . . . , iEj , Xq \A) . (2) 

Since the individual N processes are assumed to be sta- 
tistically independent of one another, this joint pdf de- 
generates into a product of two point joint pdfs, 

P(x 1 1 ,x 1 \A)P(xl,x%\A)x...xP(x?,xZ\A) . (3) 

This expression can be simplified considerably. 

First, we consider the logarithm of (3), usually called 
log- likelihood function [15], 



N 



£log [P(x*,x*\A) 



(4) 



fe=i 



With help of p(x,x') := jfJ2k=i S ( x ~ x\)S{x' - x%) 
expression (4) finally can be evaluated by means of an 
integral, 

(5) 



lOg [P (Xj, Kg, lE-p Xq, . 



• ■ J X \ 1 X Q I A) 



N 



dx j dx' p(x,x'\r) log [P(x, x'|A)] 



Since the logarithm is a monotonically increasing func- 
tion, the maximization of the likelihood function is equiv- 
alent to the maximization of its logarithm. The set A, 
that maximizes the latter expression, therefore forms 
the most likely set of parameters under the current 
parametrization. 

In [1], for the present case the minimization of the 
Kullback distance K[A] of the joint distributions has 
been proposed, 



k[A] 



dx / dx' p(x, x') log 



p{x,x') 



P(x,x'\A 
J dx J dx' p(x, x') log \p(x, x')] 

dx j dx' p(x,x')log[P(x,x'\A)] (6c) 



(6a) 
(6b) 



The term (6b) is independent of the individual set A, 
while (6c) is conform to (5). Therefore, minimization of 
K[A] evidently is equivalent to the maximization of the 
likelihood of the set of parameters A. 



III. MAXIMUM LIKELIHOOD ESTIMATION 
ON MARKOVIAN TIME SERIES 

Henceforth, individual time series xo, . . . , x n are con- 
sidered. We assume that the time lag between consecu- 
tive observations is r and that the process is stationary 
in a sense, that the statistical properties are conserved 
during the measurement period. 

Let us further assume, that the data set under con- 
sideration exhibits Markovian properties. This can be 
verified by means of the Chapman-Kolmogorov equation 
[3, 4], 



/ 



P(xi\xi- 2 ) = I dxi- 1 P(xi\x i - 1 )P(xi- 1 \xi- 2 ) , 

(7) 

that can be evaluated numerically. Although this condi- 
tion is not sufficient, it seems to be a very robust crite- 
rion. If Markovian properties arc not fulfilled, an increase 
of the number of observables by means of a delay embed- 
ding of the data may help to fulfil this constraint, if the 
amount of data is sufficiently high for such an procedure 
[3]. 

If the process under consideration is ergodic, time av- 
erages can be evaluated by means of ensemble averages. 
Then, also in this case a reasonable parametrization and 
initial condition for the vector A can be obtained by the 
direct evaluation of (1), as described in the previous sec- 
tion. Let us now iterate the arguments of the previous 
section. 

The likelihood of the current realization for a specific 
set of parameters, A, is 



P(x n , . . .,x \A) 



(8) 



Since we assume Markov properties, this joint pdf degen- 
erates into a product of two point conditional pdfs, 

P(aj„|a; n _i, A) x ... x P(x 1 \x ,A)P(x \A) . (9) 

This expression can be simplified by considering the log- 
arithm of the likelihood function. With the help of the 
definition 



p(x, x') := - S(x - Xi)5(x' - Xi-i) (10) 



we finally obtain: 

log [P(x n ,...,x \A)] 
= log [P(x \A)} 

+n dx dx' p(x, x'\t) log [P(aj|aj', A)] 



(11) 



Following the maximum likelihood approach, this expres- 
sion has to be maximized with respect to A. This is 
consistent with the minimization of 



1 



K'[A] = —log[P(x \A)} 



dx / dx' p{x, x') log 



(12) 



p(x\x' 



P(x\x',A) 



3 



It is obvious, that in the latter expression the first sum- 
mand is negligible for n 3> 1 . Even in the case of smaller 
n, the first measurement in some cases may not obey the 
stationary distribution due to transient processes of the 
measurement. On the other hand, the evaluation of the 
expression may be time-consuming since the stationary 
distribution of the process is required. In conclusion, we 
propose to solely perform the minimization of 



K[A] = J dx J dx' p(x, x') log 



p(x\x') 



P(x\x',A) 



■ (13) 



dt J 



IV. MINIMIZATION PROCEDURE FOR 
DRIFT- /DIFFUSION-PROCESSES 

We would like to emphasize, that expression (13) can 
be evaluated numerically. It is a feature of drift and diffu- 
sion processes, that the time evolution of the conditional 
pdf can be obtained from the Fokker-Planck equation [3] , 

^WI^o)] = {-E^A (1) (*) (14) 

+ J2AjD ( i f(x))p[x(t)\x'(t )} . 

This equation can be treated efficiently by implicit al- 
gorithms at least for the case x G R and x e R 2 , re- 
spectively [16]. Moreover, kernel density estimates based 
on numerical integration of the associated stochastic dif- 
ferential equation can be applied, that are described in 
greater detail in [14]. 

The data under consideration can be reduced signifi- 
cantly by a suitable discretization of data space in sev- 
eral bins. Typically, this grid should coincidence with 
the spatial discretization required for numerical solution 
of the Fokker-Planck equation. After discretization and 
numerical evaluation of the expression P(x\x', A), equa- 
tion (13) can be evaluated my means of a finite sum. 

Eventually, the set A, that minimizes (13) has to be 
investigated. This can be done by gradient method or 
more efficient approaches [16]. We want to emphasize 
that in the majority of cases a suitable starting value is 
obtained from the initial estimates (1). This is essential 
for a successful and fast minimization by any numerical 
algorithm. 
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FIG. 1: Detail of a sample process specified by equations (15) 
with D = 1.25 and 7 = .75. 



that in the long term limit obeys a lognormal distribution 

2 "\ 

. (15b) 



log 



2^ 

7 



This complies with the drift function 



D {1) {x) = x -7 log 



(15c) 



Thus, a stochastic process with a nonlinear drift term 
is discussed, that is driven by multiplicative dynamical 
noise. A feasible path of this process can be obtained by 
numerical integration of the associated stochastic differ- 
ential equation. A sample graph of the process is exhib- 
ited in figure 1. Thereby, Ito's interpretation of stochastic 
differential equations (sde) was applied. For the detailed 
properties of drift and diffusion processes we refer to [3] . 

For any one-dimensional process, the diffusion term 
significantly can be simplified by means of a nonlinear 
transformation of the state variable: For 



X 

y = y{x) = j dx' 



D 



D( 2 )(x') ' 



(16) 



V. EXAMPLE 

In this section, the performance of the minimization 
procedure is discussed by means of an example, that can 
be treated analytically. Further examples of the mini- 
mization procedure are investigated in [1, 17] for numer- 
ical and experimental data, respectively. 

We would like to address a drift and diffusion process 
in one dimension with the diffusion term 



D^(x) = Dx 2 



(15a) 



the drift and diffusion functions transform to [3] 



D 



DM(x(y)) 



D (2) (y) = d . 



(17a) 



(17b) 



In the present case, the accordant nonlinear transforma- 
tion and the transformed drift and diffusion functions 
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are: 



the minimization of 



y 

D {1) (y) 
D (2) (y) 



log(x) 

-72/ 
D . 



(18a) 
(18b) 
(18c) 



Hence, the process is equivalent to an Ornstein- 
Uhlcnbeck process in the transformed variable y. For 
this process, the conditional transition pdfs can be de- 
rived for finite time increment r [3]: 



p(y\yo,i,D) 



2 „R (1 - e-27T) 



(19) 



x exp 



(y-e 7T y ) 
2^(l-e- 2 ^) 



Let us now consider the determination of the intrinsic 
parameters 7 and D from time series data. Imagine, 
the conditional pdfs p(x\xq, 70, -Do) are known from time 
series data for a specific set of parameters (70, -Do); that 
was applied for numerical integration of the sdc. This 
initial set will now be reconstructed by means of the most 
likely set (7, -D). Following the argument of section III, 



00 00 

K(j,D) = J dx j dx p(x,x \-f ,Do) 



(20) 



x log 



p(x\x ,jo,D ) 



P{x\x 0ll ,D) 



is sufficient for this purpose. 

This expression can be calculated by means of the un- 
derlying Ornstein-Uhlenbeck process, 



K(n,D) = 



x log 



dy J dy p(y,y \j ,D ) (21) 

— OO 

p(y\yo, 10, Do) ' 



P(y\yon,D) 



As a first step, the logarithm can be evaluated for 
the specific conditional transition pdfs of the Orstein- 
Uhlenbeck process under consideration, (19). It turns 
out, that the if (7, D) solely is determined by the second 
order moments of y and yo'- 



K(-y,D) 




2^ (1 - e- 2 ^) 2-2^ (1 - e -27or) 



-7T 



-701" 



y-.yo 



7 v 



+ 



(y 2 o) v . 
1 



IJO 



-27T 



2f (l-e^r) 



Do n _ e -2 70 r) J + 2 ^ 
70 v ' / 



f (1-e 2 ^) 

Do n _ e 2 70 T) 
L 70 v ' 



-270T 



2^ (1 

70 v 



-2 7o r) 



(22) 



Finally, a closed form for the function if (7, D) can be 
derived: 



if( 7 ,£>) = xlog 



(1 - e 2 ^) 



Do 
70 



£0 (1_ 

70 v 



e 2 7 or) 

— 2e( 7 ~ 7 °) 



(23) 



7 



2(e 2 ^ - 1) 



For the initial set (70, Do) = (0.75, 1.25), this function 
is exhibited in figure 2. A distinct minimum at (7, D) = 
(0.75, 1.25) is evident, that complies with the initial set of 
parameters. In case of an application of the minimization 
procedure to real measurements, this minimum would 
have to be approached by means of gradient methods or 
advanced minimization algorithms [16]. 



VI. CONCLUSION 

In conclusion, the likelihood functions of stochastic 
processes have been derived for two specific cases. First, 



ensembles of measurements on these processes were con- 
sidered. In this connection, the iterative procedure pro- 
posed in [1] has been approved and physically motivated. 

Moreover, the maximum likelihood approach has been 
adapted to the needs of non-linear time series analysis. 
For the case of Markovian processes, an integral form of 
the estimator has been derived. A slight simplification 
of this estimator, equation (13), is purely based on two 
point conditional pdfs, that can be calculated numeri- 
cally from the Fokkcr-Planck equation in case of drift 
and diffusion processes. The integral form of the esti- 
mator allows for the reduction of huge datasets to their 
conditional transition pdfs prior to the iterative analysis 
procedure. 

Finally, the meaning of the optimal set of parameters, 
A, that is obtained by application of the method de- 
scribed in [1] , has been made explicit on the basis of the 
maximum likelihood approach: It is the most likely set 
of parameters with respect to the current parametriza- 
tion. As a consequence, the proposed procedure can be 
applied even to time series that suffer from sparse data 
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points and that could not safely be processed by the for- 
mer methods without this knowledge. 




FIG. 2: Kullback distance between processes (70 = 0.75, Do = 
1.25) and (7, D) as function of 7 and D for r = 0.5. A distinct 
minimum at (7,-D) = (70, Do) is evident. The contour lines 
are located at z = 2 l for i — — 11,..., 0. For the sake of 
clearness, the z-axis is scaled logarithmically. 
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